Developing a technique for the automatic analysis of surveillance videos in order to identify the presence of violence is of broad interest. In this work, we propose a deep neural network for the purpose of recognizing violent videos. A convolutional neural network is used to extract frame-level features from a video. The frame-level features are then aggregated using a variant of the long short-term memory that uses convolutional gates. The convolutional neural network, along with the convolutional long short-term memory, is capable of capturing localized spatio-temporal features, which enables the analysis of local motion taking place in the video. We also propose to use adjacent frame differences as the input to the model, thereby forcing it to encode the changes occurring in the video. The performance of the proposed feature extraction pipeline is evaluated on three standard benchmark datasets in terms of recognition accuracy. Comparison of the results obtained with state-of-the-art techniques revealed the promising capability of the proposed method in recognizing violent videos.
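
The adjacent-frame-difference input mentioned above can be illustrated with a minimal NumPy sketch. The function name `adjacent_frame_differences` and the `(T, H, W, C)` array layout are assumptions made for illustration and are not taken from the paper; the authors' actual preprocessing may differ.

```python
import numpy as np


def adjacent_frame_differences(frames: np.ndarray) -> np.ndarray:
    """Return the differences between consecutive frames.

    frames: array of shape (T, H, W, C), T >= 2 (assumed layout).
    Returns an array of shape (T - 1, H, W, C) with dtype float32.
    """
    if frames.ndim != 4 or frames.shape[0] < 2:
        raise ValueError("expected an array of shape (T, H, W, C) with T >= 2")
    # Cast first so that uint8 pixel values do not wrap around on subtraction.
    frames = frames.astype(np.float32)
    # Element t of the result is frame t+1 minus frame t.
    return frames[1:] - frames[:-1]


if __name__ == "__main__":
    # Hypothetical 3-frame, 2x2, single-channel clip.
    video = np.zeros((3, 2, 2, 1), dtype=np.uint8)
    video[1] = 10
    video[2] = 4
    diffs = adjacent_frame_differences(video)
    print(diffs.shape)
```

Feeding such differences to the network, rather than the raw frames, means that static background cancels out and only the changes between frames remain in the input.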